Preprint typeset in JHEP style - HYPER VERSION 



hep-lat/0209145 



Locality and Statistical Error Reduction on Correlation 
Functions 



Harvey B. Meyer 

Theoretical Physics, University of Oxford, 1 Keble Road, 
Oxford, 0X1 3NP, United Kingdom 



Email: neyer@thphys.ox.ac.uk 



Abstract: We propose a multilevel Monte-Carlo scheme, applicable to local actions, which 
is expected to reduce statistical errors on correlation functions. We give general arguments 
to show how the efficiency and parameters of the algorithm are determined by the low- 
energy spectrum. As an application, we measure the euclidean-time correlation of pairs of 
Wilson loops in SU(3) pure gauge theory with constant relative errors. In this case the 
ratio of the new method's efficiency to the standard one increases as e m ° t/ ' 2 , where uiq is 
the mass gap and t the time separation. 
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1. Introduction 

In equilibrium statistical mechanics and quantum field theory, the physical information on 
the theory is encoded in n-point functions. The short-range nature of interactions in the 
former and the causality requirement in the latter case lead to the property of locality 
of the Hamiltonian (resp. action). While certain lower-dimensional systems have been 
solved analytically, Monte-Carlo simulations have become an invaluable tool in the study 
of interacting theories such as non-abelian gauge theories. In particular, the properties of 
the spectrum are extracted from numerically calculated 2-point functions in the euclidean 
formulation: 



This formula follows directly from the insertion of the complete set of energy eigenstates 
H\n) = E n \n). In theories with a mass gap, the exponential decay of each term singles out 
the lightest state compatible with the symmetry of the operator, thus enabling us to extract 
the low- lying spectrum of the theory. However, it is precisely this decay that makes the 
2-point function numerically difficult to compute at large t. Indeed, standard algorithms 
keep the absolute error roughly constant, so that the error on the local effective mass 
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increases exponentially with the time separation. For that reason, it would be highly 
desirable to have a more efficient method to compute correlations functions at large time 
separation t. The task amounts to reduce uncorrelated fluctuations between the two time 
slices separated by euclidean time t. 

In the context of non-abelian gauge theories, two types of techniques have proved 
useful: the first aims essentially at increasing the "signal". The widely used "fuzzing" 
algorithms, such as smearing [Q] and blocking j2| ; increase the overlap onto the lightest 
states. As the lattice spacing a is made much smaller than the correlation length £ the 
number of smearing/blocking steps as well as their weights have to be increased in order 
to maintain the quality of the overlap. 

The other type of technique aims at reducing the "noise", that is, the variance of 
the quantity being evaluated. For instance, when computing a Polyakov loop with the 
multihit technique Q, a link is replaced by its thermal average under fixed neighbouring 
links. Last year, Liischer and Weisz [|| demonstrated how to make even better use of the 
locality of the theory to exponentially improve the efficiency of the algorithm that computes 
Polyakov loop correlators. In that method, pairs of time-like links are averaged over in 
time slices of increasing width with fixed boundary conditions before they are multiplied 
together to form the loops. In this paper, we present a "noise reduction" method that 
exploits the locality property in a similar way. It has the advantage of being compatible 
with the popular link-smearing and -blocking: one can cumulate the advantages of both 
types of techniques. Indeed, while the fuzzing algorithms also help reduce short wavelength 
fluctuations inside a time-slice, till now no attempt has been made to tame the noise induced 
by fluctuations appearing in neighbouring time-slices. Our method aims at averaging out 
the latter by performing additional sweeps with fixed time-slices as boundary conditions; 
as the continuum is approached, the volume over which the average is performed ought to 
be kept fixed in physical units to maintain the efficiency of the algorithm. Finally, we note 
that the idea is very general and is expected to be applicable in other types of theories. 

Outline.- We first present the idea in its full generality, without reference to the specific 
form of the action or to the quantity being computed. We next formulate a multilevel 
scheme for the case of a 2-pt function and point out how the efficiency and parameters of 
the algorithm are determined by the low-lying spectrum of the theory. We finally apply 
this algorithm to the case of (3+1) 577(3) gauge theory. 

2. Locality, hierarchical formula and multilevel algorithm 
2.1 Locality 

The locality property of most studied actions allows us to derive an interesting way of 
computing correlation functions. First we give a general, 'topological' definition of locality 
in continuum field theories. For simplicity we shall use a symbolic notation; if C denotes a 
configuration, let X , y and A be mutually disjoint subsets of C. If tlx, an d & r e their 
(disjoint) supports on the (continuous) space-time manifold A4, suppose furthermore that 

e.g. defined by l/y/a, where a is the fundamental string tension 
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any continuous path 7 : / — ► M. joining ftx and f2y (i.e. 7(1) n &x 7^ and 7(1) PlOy 7^ 0) 
passes through Qa (he. 7(1) n SI a / 0). See fig. (|l|). 

The theory with probability distribution p(C) is local if there exists functionals pa and 
such that, for any setup with this topology, 

p(X,y) = J>(-4) p A {X) PA®) (2.1) 
A 

That is, "X and y influence each other only through A". This condition is obviously 
satisfied by continuum euclidean field theories whose Lagrangian density contains a finite 
number of derivatives. With a suitable notion of "continuity" of the path 7, one can extend 
this definition to lattice actions. For instance, the Wilson action is also local in this sense, 
but note that 0,x and Qy must be separated by more than one lattice spacing in order to 
realise the setup in the first place. 

2.2 Hierarchical formula 

As a consequence, for two operators O x and O y , functionals of X and y respectively, we 
have 

(O x O y ) = O x (C)O y (C) P (C) = ^ P (A) (O x ) A (Oy) A (2.2) 
C A 

where 

(O x ) A = ^pa(X) O x (X) 
x 

(Oy) A = Y.P^ y ) Oy{y) (2.3) 
y 

are the average values of the operators at a fixed value of A. Thus the averaging pro- 
cess factorises into an average at fixed "boundary conditions" and an average over these 
boundary conditions. There are several ways in which this factorisation can be iterated: 
first, if the operator O x = Xl X2 itself factorises, the decomposition can be carried out 
also at this level, where pa now plays the role of p. This means that the decomposition 
Q2.2p allows us to treat the general n-point functions in the same way as the n = 2 case 
that we shall investigate in more detail: each factor can be averaged over separately. 




Figure 1: The sets fix, and on the space-time manifold. 
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There is another way the decomposition can be iterated: we can in turn write (O x )a 
and (O v )a as factorised averages over yet smaller subspaces, thus obtaining a nested ex- 
pression for the correlation function. A three-level version of ( |2,2| ) would be 

(O x O y ) = ^p(A) x J2pa(Ai) ^2paM2)(O x )a 2 x 

A Ai A2 

Ai A 2 

This type of formula is the basis of our multilevel algorithm for the 2-point function. 
2.3 Multilevel algorithm for the 2-point function 

The hierarchical formula ( [2.4D is completely analogous to the expression derived in Q in 
the case of the Polyakov loop, where it was also proved that it can be realised in a Monte- 
Carlo simulation by generating configurations in the usual way, then keeping the subset A 
fixed and updating the regions X and y. Suppose we do N updates of the boundary and 
n measurements of the operators for each of these updates. We are thus performing N ■ n 
updates 2 . But because the two sums in ( |2.2| ) are factorised, this in effect achieves Nn 2 
measurements. As long as 

• the latter are independent; 

• that the fluctuations on the boundary Qa have a much smaller influence than those 
occurring in fix and fly; 

• that no phase transition occurs Q due to the small volume and the boundary condi- 
tions; 

error bars reduce with the computer time r like 1/r rather than 1/y/r '■ to divide the 
error by two, we double n. The fluctuations of the boundary are only reduced in the usual 
1/\/t regime. Thus, for a fixed overall computer time, one should tune parameters of the 
multilevel algorithm so that the fluctuations in X and 3^ are reduced down to the level of 
those coming from A. 

The one- level setup we shall use in practise is illustrated in fig. (Q): fix, fly and 
flA are time-slices. An indication of how many "submeasurements" should be chosen at 
each level is given by the following consideration: if we measure an operator located in 
the middle of a time-block (of width At) bounded by A, then any fluctuation occurring 
on A can be decomposed on the basis which diagonalises the Hamiltonian of the theory. 
If the lowest-lying state compatible with the symmetry of the operator being measured 
has mass mo and At > 1/m-o, that state will act as the main 'carrier' of the fluctuation, 
so that it will induce a fluctuation of relative magnitude e _m oAt/2 on ^ e time-slice where 
the operator is measured (see fig. 1); indeed the propagation of fluctuations is damped 
exponentially in a system that develops a mass gap. Thus it is worth performing roughly 

2 An update in general consists of several sweeps. 



- 4 - 



n ~ e m ° A< measurements 3 under fixed boundary A in order to reduce the fluctuations 
coming from X down to the level of those of A. In fact, this estimate is an upper bound, 
because the vacuum state under the fixed boundary conditions could lie at a higher energy 
level than the full-lattice vacuum. Finally, if zero modes are present, we expect a power 
law n oc (At) 7 !. 

At any rate, we need to optimise the parameters of the multilevel Monte-Carlo algo- 
rithm "experimentally" , but we shall see in a practical example that this order of magnitude 
estimate is in qualitative agreement with the outcome of the optimisation procedure. 

This simple argument also shows that the purpose of the multilevel scheme is to reduce 
fluctuations occurring at all separations (from the time-slice where the operator is mea- 
sured) ranging from to At/2 with an appropriate number of updates, in order to reduce 
their influence down to the level of the out-most boundary, which is then averaged over in 
the standard way. 

2.4 Error reduction 

Suppose the theory has a mass gap and that for a given t, the correlation function C(t) ~ 
e~ mt is determined with equal amounts of computer time with the standard algorithm and 
the multilevel one. If we now want to compute C(2t) with the same relative precision with 
the former, we need to increase the number of measurements by a factor e 2mt . With the 
multilevel algorithm however, we would e.g. introduce an extra level of nesting, so as to 
multiply the number of submeasurements by e mt , as explained in the preceding section. 
Thus in this situation, the gain in computer time of this method is a factor e mt ; turned the 
other way, it achieves an error reduction oc e~ m */ 2 compared to the standard algorithm 4 . 
Since the quantity determining the error reduction is the product mt, the error reduction 
is in first approximation independent of /?, as long as we measure the correlation function 
at fixed t in physical units. It would be inefficient to perform sweeps over time-blocks that 
are much thinner than the physical length scale: none of the three conditions highlighted 
in the preceding section is likely to be satisfied. 

3 These may be nested, in which case it is the total number of measurements under fixed boundary A 
that is meant here. 

4 Note however that the computer effort is still increasing exponentially with t. 




Figure 2: Our choice of fix, and to implement the hierarchical formula. 
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If we compare this to the error reduction analysis in H, we note a "conservation of 
difficulty" law: in Wilson loop calculations, the exponent in the error reduction factor is 
the area of the loop, while here it is the time separation of the two operators. 

3. The algorithm in practise 

3.1 Preliminaries 

We consider pure SU (3) lattice gauge theory in (3 + 1) dimensions and we use the standard 
Wilson action 

S[U]=0 (1-^ReTrP^) (3.1) 

X,fl<U 

where U are the link variables, Pn V is the plaquette and (3 = 

We define zero- momentum operators as a sum over a time-slice of Wilson loop traces: 

0(t) = J2TrW(x,t) (3.2) 

X 

We are interested in measuring correlation functions of these operators: 

(OHO) 0{t)) = i J D[U] Ot( )O(t) e -W D[U ] = Y[dU(x,f,) (3.3) 

The main physical information we extract from this numerical measurement is the mass 
of the lightest state compatible with the symmetry of O, by fitting the exponential decay 
of the correlation function. Numerically the path integral is evaluated by updating the 
lattice configuration according to the Boltzmann weight and evaluating the operators at 
each of these updates. Here we calculate the correlation function of two operators carrying 
the continuum quantum numbers ++ and 2 ++ , constructed with 4x2 rectangular Wilson 
loops. The simulation is done at (3 = 5.70 on an 8 4 lattice. For the update we use a 
mixture of heat-bath || and over-relaxation [0 in the ratio 1:4, both applied to three 
SU(2) subgroups || . Finally, as was noted in Q, the most elegant way to implement a 
multilevel Monte- Carlo program is to use a recursive function. 

3.2 Averages under fixed boundary conditions 

To illustrate our arguments on the way an error reduction is achieved, we measure the 
change of the average value of an operator carrying quantum numbers 2 ++ - whose ex- 
pectation value in the full lattice is known to be exactly zero - due to fluctuations on the 
boundary and the finite number of measurements n performed: 

G n (At) = ^p(A) \(0 2++ )^\ 

A 

Fig. ([|, top) shows the dependence on the width At of G, for several fixed values of n. The 
data was obtained by nesting averages on increasing widths of the time-slices. Here, to 
reduce the number of parameters, we set n to be the same for all time-slices. Clearly, the 
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data confirms that G(At) vanishes exponentially, as anticipated (this a posteriori justifies 
the choice of keeping n constant). The exponent is a function of n, i.e. of how carefully 
the average is done: on the bottom plot, we see how fast a good estimate of the average 
value is obtained as a function of n, for the two fixed widths 2 and 4. We see that at 
n = 20, an accurate estimate of the average value has been reached. For this "true" 
average G2o(At) ~ Goo(At), our estimate of the effective mass of the decay e~ nie ff / ^ t / 2 
between At = 2 and At = 4 is 1.22(2), and 1.77(20) between At = 4 and At = 6. This 
is compatible with the mass of the lightest 2 ++ glueball we shall calculate below. Thus 
it seems that, at large widths, Goo(At) does decay with a mass close to the lightest state 
sharing the symmetry of our operator. The question now is, when has a good enough 
estimate of this quantity been achieved so that statistical errors are maximally reduced? 

3.3 Optimisation procedure 

Since errors are reduced with the number of measurements as at large n, we plot 

(G n (At)) 2 x n and, at fixed At, choose n to minimise this quantity. Eventually this quantity 
must increase, because the "error" G tends to a constant at large n. 

We have already seen (fig. (||) right) that for the smallest width, the average is reached 
very smoothly. As a consequence, the optimal according to our criterion is small: ri2=5. 
If we now fix this parameter at that value and move to the width 4 block, fig. (Q) reveals 
that the corresponding optimisation curve reaches its minimum between = 70 and 90. 
Indeed, G reaches its asymptotic value around these values of n^. The data was obtained as 
an average over 30 boundary conditions. Between submeasurements, 5 update sweeps are 
performed. The error reduction achieved with slightly different parameters below shows 
that the success of the algorithm is not all too dependent on a fine-tuning of parameters. 

Finally, we note that to optimise the parameters for a ++ operator, whose expectation 
value in the full lattice does not vanish, we need to minimise the fluctuation around that 
value (which has to be known in advance). 

3.4 Results 

We compute the ++ and 2 ++ correlation functions formed with a 4x2 rectangular Wilson 
loop at 4 lattice spacings. We use two smearing steps on the operator. Errors are estimated 
with a standard jackknife analysis using 26 bins. A two-level scheme is implemented: the 
8 time-slices are split into 2 time-blocks of width 4, which are in turn decomposed into 
2 time-blocks of width 2. We restrict ourselves to the measurement of the correlation 
function at even time-separations. 

For the ++ correlation function at 4 lattice spacings, one "measurement" comprises 
10 submeasurements at the lower level, 40 at the upper level. When performing the latter, 
we are free to compute the and 2 lattice spacing correlation in the standard way (thus 
the error reduction is only applied to the t = 4a correlation). We collected 260 of these 
compound measurements. 

We proceed similarly in the 2 ++ case with following parameters: one "measurement" 
comprises 8 submeasurements at the lower level, 150 at the upper level, each of these being 
preceded by 5 sweeps; our program needs about 8.3 minutes on a standard alpha machine 
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to perform one of these compound measurements. We collected 520 of them; because they 
are time-consuming, we perform ~ 200 sweeps between them to reduce their statistical 
dependence. 

The following values for the correlation functions, as well as their corresponding local 
effective masses (extracted from a cosh fit), were obtained: 



t/a 


(Oj(0) O (t)) 


amfj f (t) 


(Ol(0) 2 (t)> 


(2) 

amy f (t) 





1.0000(65) 




1.0000(14) 




1 




1.017(35) 




2.151(75) 


2 


0.1331(99) 




1.36(20) x 10~ 2 




3 




0.929(49) 




1.794(74) 


4 


0.0406(39) 




7.49(70) x 10" 4 





Fig. (||) shows a plot of the two correlation functions. The small error bars appear to be 
roughly constant on the logarithmic scale. The t = 4a correlation of the 2 ++ operator shows 
that a factor 20 error reduction has been achieved with respect to the t = point. It is 
already at an accuracy that would be inaccessible on current single-processor machines with 
the standard algorithm. Indeed the latter yields error bars that are roughly independent of 
t and would have given an error comparable to our error on the t = data, where we do not 
achieve any error reduction. The naive error-reduction estimate of section 2A evaluates 
to exp(1.8 x 4/2) ~ 36. Not unexpectedly, the observed reduction factor is somewhat 
lower than this naive estimate, presumably because the configurations generated at fixed 
boundary conditions are quite strongly correlated. 



4. Conclusion 

In this paper we have proposed a general scheme in which the accuracy of numerically 
computed n-point functions in local field theories could be improved by the use of nested 
averages. While the procedure is known to be exact, the efficiency of the algorithm must 
ultimately be tested on a case-by-case basis. However, a simple application to SU(3) 
Wilson loop correlations showed that, in some cases at least, the multilevel algorithm 
drastically reduces statistical errors. It can straightforwardly be used in conjunction with 
the smearing and blocking techniques. A further nice feature of the 2-point function case is 
that previous knowledge of the low-energy spectrum provides useful guidance in the tuning 
of the algorithm's many parameters. Also, the error reduction achieved was roughly as 
anticipated. We intend to use multilevel algorithms to extract the higher-spin spectrum Q 
of the SU (3) gauge theory, where the higher masses involved and the successful use of the 
variational method require a high level of accuracy on the correlation functions. 

Independently of statistical errors, it is much harder to determine quantities extracted 
from n-point functions with n > 3; if these difficulties are overcome, the multilevel scheme 
could prove a decisive asset in those numerical measurements. 
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Figure 3: "Error" on the average value of the 2 ++ operator (4x2 rectangles), as a function of the 
width of the time-block (top) and the number of measurements (bottom) . 
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Figure 4: Optimisation curve for the width 4 level 
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Figure 5: The ++ and 2 ++ correlation functions 
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